8 research outputs found

Simultaneous floating-point sine and cosine for VLIW integer processors

Authors: Jeannerod Claude-Pierre; Jourdan-Lu Jingyan
Publication venue: HAL CCSD
Publication date: 01/01/2012

Accepted for publication in the proceedings of the 23rd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2012). Graphics and signal processing applications often require that sines and cosines be evaluated at the same floating-point argument, and in such cases a very fast computation of the pair of values is desirable. This paper studies how 32-bit VLIW integer architectures can be exploited to perform this task accurately for IEEE single precision. We describe software implementations of sinf, cosf, and sincosf over [-pi/4, pi/4] that have a proven 1-ulp accuracy and whose latencies on STMicroelectronics' ST231 VLIW integer processor are 19, 18, and 19 cycles, respectively. This performance is obtained by introducing a novel algorithm for simultaneous sine and cosine that combines univariate and bivariate polynomial evaluation schemes.

HAL-ENS-LYON

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1
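
To make the idea of evaluating sine and cosine at the same argument concrete, here is a minimal sketch in C. It is not the algorithm from the paper: it uses plain float arithmetic and truncated Taylor polynomials instead of integer arithmetic and combined univariate/bivariate schemes, and it carries no proven accuracy bound. The name `sincos_sketch` is hypothetical. The only point illustrated is that both results can share the computation of x*x and that the two polynomial evaluations are independent of each other.

```c
#include <assert.h>

/* Hypothetical helper for illustration only (not the paper's method).
   Computes approximations of sin(x) and cos(x) for |x| <= pi/4. */
void sincos_sketch(float x, float *s, float *c)
{
    const float quarter_pi = 0.7853982f;
    assert(x >= -quarter_pi && x <= quarter_pi);

    float x2 = x * x; /* shared by both polynomials */

    /* sin(x) ~ x * (1 - x^2/3! + x^4/5! - x^6/7! + x^8/9!) */
    float ps = 1.0f + x2 * (-1.0f / 6 + x2 * (1.0f / 120
             + x2 * (-1.0f / 5040 + x2 * (1.0f / 362880))));

    /* cos(x) ~ 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8! */
    float pc = 1.0f + x2 * (-0.5f + x2 * (1.0f / 24
             + x2 * (-1.0f / 720 + x2 * (1.0f / 40320))));

    *s = x * ps;
    *c = pc;
}
```

Because the two Horner chains do not depend on each other, a VLIW processor could in principle issue their operations in parallel, which is one reason a combined routine can cost little more than a single sine or cosine.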

Non-generic floating-point software support for embedded media processing

Authors: Jeannerod Claude-Pierre; Jourdan-Lu Jingyan; Monat Christophe
Publication venue: HAL CCSD
Publication date: 01/01/2012

This paper presents some work in progress on the design and implementation of efficient floating-point software support for embedded integer processors. We provide quantitative evidence of the benefits of supporting various non-generic (that is, specialized, fused, or simultaneous) operations in addition to the five basic arithmetic operations: for individual calls, speedups range from 1.12x to 4.86x, while on DSP kernels and benchmarks our approach is up to 1.34x faster.

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

How to square floats accurately and efficiently on the ST231 integer processor

Authors: Jeannerod Claude-Pierre; Jourdan-Lu Jingyan; Monat Christophe; Revy Guillaume
Publication venue: HAL CCSD
Publication date: 19/11/2010

We consider the problem of computing IEEE floating-point squares by means of integer arithmetic. We show how the specific properties of squaring can be exploited in order to design and implement algorithms that have much lower latency than those for general multiplication, while still guaranteeing correct rounding. Our algorithm descriptions are parameterized by the floating-point format, aim at high instruction-level parallelism (ILP) exposure, and cover all rounding modes. We show further that their C implementation for the binary32 format yields efficient code for targets like the ST231 VLIW integer processor from STMicroelectronics, with a latency at least 1.75x smaller than that of general multiplication in the same context.

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot
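
As a rough illustration of what squaring a binary32 value with integer arithmetic involves, the sketch below works directly on the bit encoding. It is an assumption-laden simplification, not the algorithms of the paper: it handles only normal inputs whose square is also normal, supports round-to-nearest-even only, and makes no attempt at low latency or ILP exposure. The name `square_sketch` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: squares a normal binary32 value using integer
   operations, rounding to nearest (ties to even). */
float square_sketch(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);

    uint32_t e = (bits >> 23) & 0xFF;           /* biased exponent */
    assert(e != 0 && e != 255);                 /* normal inputs only */
    uint64_t m = (bits & 0x7FFFFF) | 0x800000;  /* 24-bit significand */

    uint64_t p = m * m;                   /* exact product in [2^46, 2^48) */
    int32_t er = 2 * (int32_t)e - 127;    /* biased exponent of the result */
    if (p >> 47) er++;                    /* significand square in [2, 4) */
    else p <<= 1;                         /* normalize p into [2^47, 2^48) */

    uint32_t mant = (uint32_t)(p >> 24);      /* kept 24 bits */
    uint32_t rest = (uint32_t)(p & 0xFFFFFF); /* discarded 24 bits */
    if (rest > 0x800000 || (rest == 0x800000 && (mant & 1)))
        mant++;                           /* round to nearest, ties to even */
    if (mant >> 24) { mant >>= 1; er++; } /* rounding carried out */

    assert(er >= 1 && er <= 254);         /* no overflow or underflow here */
    uint32_t r = ((uint32_t)er << 23) | (mant & 0x7FFFFF); /* sign bit is 0 */
    float y;
    memcpy(&y, &r, sizeof y);
    return y;
}
```

Note that the sign of the input is simply discarded and the result exponent is obtained by doubling, two of the simplifications that squaring allows compared with a general product.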

Simultaneous floating-point sine and cosine for VLIW integer processors

Authors: Jeannerod Claude-Pierre; Jourdan-Lu Jingyan
Publication venue: HAL CCSD
Publication date: 09/07/2012

Hal-Diderot

Techniques and tools for implementing IEEE 754 floating-point arithmetic on VLIW integer processors

Authors: Bertin Christian; Jeannerod Claude-Pierre; Jourdan-Lu Jingyan; Knochel Hervé; Monat Christophe; Mouilleron Christophe; Muller Jean-Michel; Revy Guillaume
Publication venue: Association for Computing Machinery (ACM)
Publication date: 21/07/2010

Recently, some high-performance IEEE 754 single precision floating-point software has been designed, which aims at best exploiting some features (integer arithmetic, parallelism) of the STMicroelectronics ST200 Very Long Instruction Word (VLIW) processor. We review here the techniques and software tools used or developed for this design and its implementation, and how they allowed very high instruction-level parallelism (ILP) exposure. Key points include a hierarchical description of function evaluation algorithms, the exploitation of the standard encoding of floating-point data, the automatic generation of fast and accurate polynomial evaluation schemes, and some compiler optimizations.

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot
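
The link between polynomial evaluation schemes and ILP exposure can be illustrated with two textbook schemes. The sketch below is not taken from the paper and does not reproduce its automatically generated schemes. It only contrasts Horner's rule, a single dependency chain, with an Estrin-style grouping whose sub-expressions are independent. The function names are hypothetical, and float arithmetic is used for brevity where the paper's setting is integer arithmetic.

```c
#include <assert.h>

/* Horner's rule for n coefficients a[0..n-1]: every step needs the
   previous result, so the operations form one sequential chain. */
float horner(const float *a, int n, float x)
{
    assert(n >= 1);
    float r = a[n - 1];
    for (int i = n - 2; i >= 0; i--)
        r = r * x + a[i];
    return r;
}

/* Estrin-style evaluation of a degree-7 polynomial (8 coefficients):
   the four pair terms and the powers x2, x4 are mutually independent,
   so a wide-issue processor can compute them in parallel. */
float estrin8(const float *a, float x)
{
    float x2 = x * x;
    float x4 = x2 * x2;
    float p01 = a[0] + a[1] * x;
    float p23 = a[2] + a[3] * x;
    float p45 = a[4] + a[5] * x;
    float p67 = a[6] + a[7] * x;
    float lo = p01 + x2 * p23;
    float hi = p45 + x2 * p67;
    return lo + x4 * hi;
}
```

Both functions compute the same polynomial in exact arithmetic, but in floating point they may round differently, which is why a scheme chosen for speed also needs its accuracy checked.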